NSF PAR Search | NSF Public Access Repository

FuncFetch: an LLM-assisted workflow enables mining thousands of enzyme–substrate interactions from published manuscripts

https://doi.org/10.1093/bioinformatics/btae756

Smith, Nathaniel; Yuan, Xinyu; Melissinos, Chesney; Moghe, Gaurav; Wren, Jonathan (Ed.) (December 2024, Bioinformatics)

Abstract

Motivation: Thousands of genomes are publicly available; however, most genes in those genomes have poorly defined functions. This is partly due to a gap between previously published, experimentally characterized protein activities and activities deposited in databases. This activity deposition is bottlenecked by the time-consuming biocuration process. The emergence of large language models presents an opportunity to speed up the text-mining of protein activities for biocuration.

Results: We developed FuncFetch, a workflow that integrates NCBI E-Utilities, OpenAI’s GPT-4, and Zotero, to screen thousands of manuscripts and extract enzyme activities. Extensive validation revealed high precision and recall of GPT-4 in determining whether the abstract of a given paper indicates the presence of a characterized enzyme activity in that paper. Provided the manuscript, FuncFetch extracted data such as species information, enzyme names, sequence identifiers, substrates, and products, which were subjected to extensive quality analyses. Comparison of this workflow against a manually curated dataset of BAHD acyltransferase activities demonstrated a precision/recall of 0.86/0.64 in extracting substrates. We further deployed FuncFetch on nine large plant enzyme families. Screening 26 543 papers, FuncFetch retrieved 32 605 entries from 5459 selected papers. We also identified multiple extraction errors including incorrect associations, nontarget enzymes, and hallucinations, which highlight the need for further manual curation. The BAHD activities were verified, resulting in a comprehensive functional fingerprint of this family and revealing that ∼70% of the experimentally characterized enzymes are uncurated in the public domain. FuncFetch represents an advance in biocuration and lays the groundwork for predicting the functions of uncharacterized enzymes.

Availability and implementation: Code and minimally curated activities are available at https://github.com/moghelab/funcfetch and https://tools.moghelab.org/funczymedb.
Investigating ocean circulation dynamics through data assimilation: A mathematical study using the Stommel box model with rapid oscillatory forcings

https://doi.org/10.1063/5.0215236

Smith, Nathaniel; Shiney-Ajay, Anvaya; Fleurantin, Emmanuel; Pasmans, Ivo (October 2024, Chaos: An Interdisciplinary Journal of Nonlinear Science)

We investigate ocean circulation changes through the lens of data assimilation using a reduced-order model. Our primary interest lies in the Stommel box model, one of the most practical models capable of reproducing the meridional overturning circulation. The Stommel box model has at most two regimes: TH (temperature-driven circulation with sinking near the north pole) and SA (salinity-driven circulation with sinking near the equator). Currently, the meridional overturning is in the TH regime. Using box-averaged Met Office EN4 ocean temperature and salinity data, our goal is to estimate the probability that a future regime change occurs and to establish how this probability depends on the uncertainties in initial conditions, parameters, and forcings. We will achieve this using data assimilation tools and DAPPER within the Stommel box model with fast oscillatory regimes.
Full Text Available
Disruption and recovery of reaction–diffusion wavefronts interacting with concave, fractal, and soft obstacles

https://doi.org/10.1016/j.physa.2020.125536

Yu, Yang F.; Fuller, Chase A.; McGuire, Margaret K.; Glaser, Rebecca; Smith, Nathaniel J.; Manz, Niklas; Lindner, John F. (March 2021, Physica A: Statistical Mechanics and its Applications)
Full Text Available
